Keir Fraser [Fri, 13 Nov 2009 21:09:33 +0000 (21:09 +0000)]
remus: Add missing python __init__.py file
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 13 Nov 2009 17:21:13 +0000 (17:21 +0000)]
remus: Add missing unistd.h include to libcheckpoint.c
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 13 Nov 2009 17:02:25 +0000 (17:02 +0000)]
remus: Fix makefiles for indentation
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 13 Nov 2009 15:46:58 +0000 (15:46 +0000)]
Merge
Keir Fraser [Fri, 13 Nov 2009 15:38:57 +0000 (15:38 +0000)]
vtd: Make vtd faults dmesg more readable
This simple patch makes the VT-d fault messages in dmesg more readable
and more helpful for debugging.
Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
Keir Fraser [Fri, 13 Nov 2009 15:34:46 +0000 (15:34 +0000)]
Remus: support for network buffering
This currently relies on the third-party IMQ patch (linuximq.net)
being present in dom0. The plan is to replace this with a direct hook
into netback eventually.
This patch includes a pared-down and patched copy of ebtables to
install IMQ on a VIF.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Fri, 13 Nov 2009 15:34:03 +0000 (15:34 +0000)]
Remus: add control script to activate remus on a VM
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Fri, 13 Nov 2009 15:33:37 +0000 (15:33 +0000)]
Remus: add python control extensions
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Fri, 13 Nov 2009 15:31:45 +0000 (15:31 +0000)]
x86: Change the interface physdev_map_pirq to support new dom0.
It also keeps compatibility with old dom0.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Fri, 13 Nov 2009 15:31:16 +0000 (15:31 +0000)]
libxenlight: implement pci passthrough
This patch implements pci passthrough (hotplug and coldplug) in
libxenlight. It also adds three new commands to xl: pci-attach,
pci-detach and pci-list.
Currently FLR on a device is done by writing to
/sys/bus/pci/drivers/pciback/do_flr.
pciback do_flr is present in both the XCI and XCP 2.6.27 kernels.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 13 Nov 2009 15:30:24 +0000 (15:30 +0000)]
libxenlight: fix name to domid conversion
This patch makes sure that the domain name to domid conversion is
correct, cross referencing the information found on xenstore with the
list of running domains.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Thu, 12 Nov 2009 15:34:37 +0000 (15:34 +0000)]
x86: Disable spinlock checks temporarily while bringing a CPU online.
This is safe, as described in a code comment. Also fix up another
comment in start_secondary() while we're there.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 12 Nov 2009 13:15:40 +0000 (13:15 +0000)]
Don't assume vcpu_ids are contiguous in alloc_vcpu
When CPUs are hot-added, this assumption breaks because the hot-added
CPUs may be brought online by dom0 in arbitrary order. This patch avoids
making the assumption while still linking vcpus in ascending order of
identifier.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
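A minimal C sketch of linking a new vcpu into a list in ascending vcpu_id order without assuming the IDs are contiguous or arrive in order. The cut-down struct vcpu and the link_vcpu_sorted() helper are hypothetical illustrations, not the actual alloc_vcpu() code.

```c
#include <stddef.h>

/* Hypothetical, cut-down vcpu: only the fields needed for linking. */
struct vcpu {
    int vcpu_id;
    struct vcpu *next_in_list;
};

/*
 * Insert v into the singly linked list headed by *head, keeping the list
 * sorted by ascending vcpu_id. IDs may arrive in any order and with gaps.
 */
static void link_vcpu_sorted(struct vcpu **head, struct vcpu *v)
{
    struct vcpu **pp = head;

    /* Advance to the first vcpu with a larger ID, or to the list end. */
    while (*pp != NULL && (*pp)->vcpu_id < v->vcpu_id)
        pp = &(*pp)->next_in_list;

    v->next_in_list = *pp;
    *pp = v;
}
```

Using a pointer-to-pointer walk means inserting at the head needs no special case.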
Keir Fraser [Thu, 12 Nov 2009 13:02:27 +0000 (13:02 +0000)]
Revert 20045:
db1890f07661 "Revert alloc_idle_vcpu()..."
The old implementation of alloc_idle_vcpu() is unnecessary since
arch-specific code ensures that a single idle domain supports NR_CPUS
vcpus, despite the usual limit of MAX_VIRT_CPUS for ordinary domains.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 12 Nov 2009 11:59:18 +0000 (11:59 +0000)]
x86: Remove non-CONFIG_HOTPLUG_CPU code, and general cleanup.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 12 Nov 2009 11:43:21 +0000 (11:43 +0000)]
Support physical CPU hot-add in xen hypervisor
This patch adds CPU hot-add support.
a) All CPUs are marked as possible at boot if CONFIG_HOTPLUG_CPU is
set. Note that this increases the size of the per_cpu area.
b) When a CPU is added through the hypercall, it is marked as present
and offline, and its NUMA information is set up if NUMA is supported.
The CPU is then brought online explicitly by dom0.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Thu, 12 Nov 2009 11:42:36 +0000 (11:42 +0000)]
Update pcpu_info hypercall interface
This patch changes the XENPF_get_cpuinfo interface to pass information
for only one pcpu per hypercall. It also replaces
xenpf_resource_hotplug with XENPF_cpu_online/offline.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Thu, 12 Nov 2009 11:42:02 +0000 (11:42 +0000)]
A few trivial cleanups
Alphabetize object files and guest config options for better
readability. Also remove svm interrupt prototypes which do not
exist.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Thu, 12 Nov 2009 11:40:44 +0000 (11:40 +0000)]
xend/xm: Add PSCSI_HBA class and DSCSI_HBA class to XenAPI
Until now, XenAPI (not xapi) supported only LUN assignment mode for
pvSCSI. These patches add support for HOST assignment mode as well,
by adding the PSCSI_HBA and DSCSI_HBA classes to XenAPI.
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Thu, 12 Nov 2009 11:39:51 +0000 (11:39 +0000)]
PoD: Handle operations properly when domain is dying
No populate-on-demand activities should happen when a domain is dying.
In particular, it is a bug for memory to be added to the PoD cache when
d->is_dying is non-zero, since if this happens after the cache has
been emptied, these pages will never be freed. This may cause "zombie
domains" to linger.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Wed, 11 Nov 2009 13:11:44 +0000 (13:11 +0000)]
blktap2: Remove gnu89-inline option from CFLAGS
Not supported by older versions of gcc.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 10 Nov 2009 13:04:45 +0000 (13:04 +0000)]
Mark CPU present when it is detected
Currently a CPU is marked as present only after it has been kicked off
successfully, i.e. it is not present until it has been brought up.
This patch marks a CPU as present as soon as it is detected (either
through the MPS table or ACPI). If it cannot be brought up
successfully, it is marked as non-present again. This change is
mainly for CPU hot-plug: as discussed, physical CPU hot-add takes two
steps. A CPU is first marked as present, and later brought online.
Also, smp_boot_cpus() now only needs to scan present CPUs rather than
looping from 0 to NR_CPUS. With this change, bios_cpu_apicid is no
longer needed.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Tue, 10 Nov 2009 13:03:42 +0000 (13:03 +0000)]
Hypercall to expose physical CPU information.
It also makes some changes to the current CPU online/offline logic:
1) CPU online/offline now triggers a vIRQ to dom0 to notify it of the
status change.
2) It adds an interface to the platform operation to online/offline a
physical CPU. Currently the CPU online/offline interface is in sysctl,
which cannot be triggered from the kernel. With this change, it is
possible to trigger CPU online/offline in dom0 through a sysfs interface.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Tue, 10 Nov 2009 13:01:09 +0000 (13:01 +0000)]
tools: Make it build again on NetBSD
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 9 Nov 2009 22:41:23 +0000 (22:41 +0000)]
libxl: Call to open() must specify mode with O_CREAT.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
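For reference, a minimal POSIX sketch of the rule this fix follows: when O_CREAT is passed, open() reads a third mode argument, and leaving it out makes the new file's permissions unpredictable. The create_file() helper and the chosen owner-rw/others-r mode are illustrative assumptions, not the libxl call site.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create (or truncate) a file for writing. Because O_CREAT is set, the
 * mode argument must be supplied; it is still filtered by the umask.
 */
static int create_file(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC,
                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
}
```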
Keir Fraser [Mon, 9 Nov 2009 22:30:21 +0000 (22:30 +0000)]
unlzma: Remove 'inline' decl from non-static function.
Breaks the build with some versions of gcc.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 9 Nov 2009 20:43:40 +0000 (20:43 +0000)]
x86: Fix clip_to_limit().
There are issues in updating the e820 map in the middle of a loop that
iterates over it. For example, after memmove(&e820.map[i],
&e820.map[i+1], ...), the original e820.map[i+1] becomes the new
e820.map[i], but the next loop iteration examines index i+1, so the
original e820.map[i+1] is skipped.
Fix and clarify the code by using a double loop.
Original bug discovery and fix by Xiao Guangrong <ericxiao.gr@gmail.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
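A simplified C sketch of the double-loop approach, assuming a cut-down e820 map: each time an entry is truncated or deleted, the inner scan restarts from the beginning, so an entry shifted into slot i by memmove() is never skipped. The struct layout, the MAX_ENTRIES bound and clip_to_limit_sketch() are hypothetical; this is not the actual Xen clip_to_limit().

```c
#include <stdint.h>
#include <string.h>

#define E820_RAM    1
#define MAX_ENTRIES 16

struct e820entry { uint64_t addr, size; uint32_t type; };
struct e820map { unsigned int nr_map; struct e820entry map[MAX_ENTRIES]; };

/* Clip every RAM entry so it ends at or below 'limit'. */
static void clip_to_limit_sketch(struct e820map *e, uint64_t limit)
{
    int changed;

    do {
        changed = 0;
        for (unsigned int i = 0; i < e->nr_map; i++) {
            struct e820entry *ent = &e->map[i];

            if (ent->type != E820_RAM || ent->addr + ent->size <= limit)
                continue;

            if (ent->addr < limit) {
                /* Straddles the limit: truncate it. */
                ent->size = limit - ent->addr;
            } else {
                /* Entirely above the limit: delete it, shifting later entries down. */
                memmove(&e->map[i], &e->map[i + 1],
                        (e->nr_map - i - 1) * sizeof(e->map[0]));
                e->nr_map--;
            }
            changed = 1;
            break; /* restart the inner scan so no shifted entry is missed */
        }
    } while (changed);
}
```

With several RAM entries above the limit at consecutive indices, a single loop that increments i after the memmove() would delete only every other one; restarting costs a rescan, which is cheap for a table of e820 size.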
Keir Fraser [Mon, 9 Nov 2009 20:06:48 +0000 (20:06 +0000)]
cmdline_parse_early: fix parse 'edd=' option
If 'edd=' is the default, it should decrease "opt_edd", not "opt_edid".
Signed-off-by: Xiao Guangrong <ericxiao.gr@gmail.com>
Keir Fraser [Mon, 9 Nov 2009 20:05:43 +0000 (20:05 +0000)]
e820: fix e820_change_range_type()
In the following case, e820_change_range_type() returns success even
though it has actually failed: [s, e] is in the middle of [rs, re] and
e820->nr_map+1 >= ARRAY_SIZE(e820->map). This patch fixes it.
Signed-off-by: Xiao Guangrong <ericxiao.gr@gmail.com>
Keir Fraser [Mon, 9 Nov 2009 19:54:28 +0000 (19:54 +0000)]
libxenlight: initial libxenlight implementation under tools/libxl
Signed-off-by: Vincent Hanquez <Vincent.Hanquez@eu.citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 9 Nov 2009 19:45:06 +0000 (19:45 +0000)]
blktap2: add remus driver
Blktap2 port of remus disk driver. Backwards compatible with blktap1
implementation.
Signed-off-by: Ryan O'Connor <rjo@cs.ubc.ca>
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:41:16 +0000 (19:41 +0000)]
Remus: Fixup for tap:tapdisk syntax in remus uname
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:40:48 +0000 (19:40 +0000)]
blktap2: only open driver stack once
Currently blktap2 opens a driver stack, closes it, and re-opens
it. This causes problems with our remus driver: the primary may
connect to the backup in between the first and second open.
This is a temporary fix.
Signed-off-by: Ryan O'Connor <rjo@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:40:14 +0000 (19:40 +0000)]
blktap2: configurable driver chains
Blktap2 allows block device drivers to be layered to create more
advanced virtual block devices. However, composing a layered driver is
not exposed to the user. This patch fixes this, and allows the user to
explicitly specify a driver chain when starting a tapdisk process,
using the pipe character ('|') to explicitly separate layers in a
blktap2 configuration string.
For example, the command:
~$ tapdisk2 -n "log:|aio:/path/to/file.img"
will create a blktap2 device where read and write requests are passed
to the 'log' driver, then forwarded to the 'aio' driver.
Signed-off-by: Ryan O'Connor <rjo@cs.ubc.ca>
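A minimal C sketch of splitting such a chain string into its layers, in the order a request passes through them. The split_driver_chain() helper and the MAX_LAYERS bound are hypothetical; this is not tapdisk2's actual parser.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LAYERS 8

/*
 * Split a chain such as "log:|aio:/path/to/file.img" into layers. Each
 * '|' is overwritten with '\0', so 'spec' is modified in place and
 * layers[] points into it. Returns the layer count, or -1 on an empty
 * layer or more than max_layers layers.
 */
static int split_driver_chain(char *spec, char *layers[], int max_layers)
{
    int n = 0;
    char *cur = spec;

    for (;;) {
        char *bar = strchr(cur, '|');

        if (bar != NULL)
            *bar = '\0';              /* terminate this layer */
        if (*cur == '\0' || n == max_layers)
            return -1;                /* empty layer or too many layers */
        layers[n++] = cur;
        if (bar == NULL)
            return n;                 /* that was the last layer */
        cur = bar + 1;
    }
}
```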
Keir Fraser [Mon, 9 Nov 2009 19:19:27 +0000 (19:19 +0000)]
Remus: Make checkpoint buffering HVM-aware
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:17:22 +0000 (19:17 +0000)]
Remus: Do bitmap scan word-by-word before bit-by-bit.
For sparse bitmaps and large domains this saves a lot of time.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
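A minimal C sketch of a word-before-bit scan: a whole zero word is skipped with one comparison, and only a word known to contain a set bit is examined bit by bit. The find_next_set_bit() helper is hypothetical, not the Remus code in xc_domain_save.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Index of the first set bit at or after 'start', or nbits if none. */
static size_t find_next_set_bit(const unsigned long *map, size_t nbits,
                                size_t start)
{
    size_t i = start;

    while (i < nbits) {
        /* Word-level check: drop bits below i, then test the rest at once. */
        unsigned long word = map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD);

        if (word == 0) {
            i = (i / BITS_PER_WORD + 1) * BITS_PER_WORD; /* next word */
            continue;
        }
        /* Bit-level scan, only inside a word that has a set bit. */
        while ((word & 1UL) == 0) {
            word >>= 1;
            i++;
        }
        return i < nbits ? i : nbits;
    }
    return nbits;
}
```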
Keir Fraser [Mon, 9 Nov 2009 19:16:48 +0000 (19:16 +0000)]
Remus: Do not bother with to_skip/to_fix bitmaps after the first final round.
Signed-off-by: Geoffrey Lefebvre <geoffrey@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:16:19 +0000 (19:16 +0000)]
Remus: Buffer checkpoint data locally until domain has resumed execution.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:15:34 +0000 (19:15 +0000)]
Remus: Initiate failover if a packet is not received every 500ms.
This breaks checkpoints at lower frequencies, and should be made
optional.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:14:03 +0000 (19:14 +0000)]
Remus: Make xc_domain_restore loop until the fd is closed.
The tail containing the final PFN table, VCPU contexts and
shared_info_page is buffered, then the read loop is restarted.
After the first pass, incoming pages are buffered until the next tail
is read, completing a new consistent checkpoint. At this point, the
memory changes are applied and the loop begins again. When the fd read
fails, the tail buffer is processed.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 19:06:25 +0000 (19:06 +0000)]
Remus: Add callbacks for suspend, postcopy and preresume in xc_domain_save.
This makes it possible to perform repeated checkpoints.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Mon, 9 Nov 2009 18:54:27 +0000 (18:54 +0000)]
x86, hvm: Make host TscInvariant CPUID flag visible to guest by default.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 9 Nov 2009 08:19:55 +0000 (08:19 +0000)]
x86_32: Respect e820 map when populating Xen heap.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 9 Nov 2009 08:03:30 +0000 (08:03 +0000)]
x86, cpuid: mask TSC invariant bit for PV and HVM domains if migration
is not disabled and TSC is not emulated
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 9 Nov 2009 07:52:27 +0000 (07:52 +0000)]
x86/dom0: support bzip2 and lzma compressed bzImage payloads
This matches functionality in the tools already supporting the same
for DomU-s.
Code taken from Linux 2.6.32-rc and adjusted as little as possible to
be usable in Xen.
The question is whether, particularly for non-Linux Dom0-s, plain ELF
images compressed by bzip2 or lzma should also be supported.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Thu, 5 Nov 2009 12:00:58 +0000 (12:00 +0000)]
xentop: Add two more VBD statistics
In addition to the VBD read/write request counts, also report VBD
read/write sector counts. This makes VBD throughput easier to observe.
As the method to get this information is OS-dependent, only Linux
code is added.
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Wed, 4 Nov 2009 22:32:01 +0000 (22:32 +0000)]
xc_resume: fix modify_returncode when host width != guest width
Also improve checking in xc_domain_resume_any().
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 4 Nov 2009 18:14:02 +0000 (18:14 +0000)]
Keir Fraser [Tue, 3 Nov 2009 12:41:54 +0000 (12:41 +0000)]
xen passthrough: fix recent regressions
This patch fixes the recent regressions pointed out by Dexuan, keeping
pci passthrough working with stubdom too. In particular, calling
device_create when pci_state == 'Initialising' is a mistake, because
the state is always Initialising when attaching a device, whereas
device_create has to be called only when the pci backend is missing.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 3 Nov 2009 12:40:28 +0000 (12:40 +0000)]
x86: improve reporting through XENMEM_machine_memory_map
Since Dom0 derives machine address ranges usable for assigning PCI
device resources from the output of this sub-hypercall, Xen should
make sure it properly reports all ranges not suitable for this (as
either reserved or unusable):
- RAM regions excluded via command line option
- memory regions used by Xen itself (LAPIC, IOAPICs)
While the latter should generally already be excluded by the
BIOS-provided E820 table, this apparently isn't always the case, at least
for IOAPICs, and with Linux having got changed to account for this it
seems to make sense to also do so in Xen.
Generally the HPET range should also be excluded here, but since it
isn't being reflected in Dom0's iomem_caps (and can't be, as it's a
sub-page range) I wasn't sure whether adding explicit code for doing
so would be reasonable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 3 Nov 2009 09:33:22 +0000 (09:33 +0000)]
x86: Clean up APIC local timer handling.
1. Writing TMICT=0 disables the timer. Use this fact to simplify and
improve reprogram_timer(). In particular, we always write TMICT, and
write zero when we do not need a timer interrupt.
2. In HPET broadcast timer handler, set TMICT=0 when we mask the APIC
local timer. May as well do this early, before entering deep sleep.
3. In HVM-guest APIC emulation, disable the emulated local timer when
the guest sets TMICT=0. Previously we would issue an immediate
one-shot interrupt.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 3 Nov 2009 08:40:40 +0000 (08:40 +0000)]
vmx: Disable vPMU feature by default
Signed-off-by: Shan Haitao <haitao.shan@intel.com>
Keir Fraser [Tue, 3 Nov 2009 08:39:21 +0000 (08:39 +0000)]
Linux vbd hotplug: Speed up finding a loopback device
- Use the device and inode information provided by losetup to find
if the vbd backing file is in use on another vbd.
- Use losetup to find a free loopback device.
Signed-off-by: Gary Grebus <gary.grebus@oracle.com>
Keir Fraser [Tue, 3 Nov 2009 08:38:55 +0000 (08:38 +0000)]
Linux vbd hotplug: Avoid "leaked" loopback devices
Avoid races between hotplug "add" and "remove" leading to "leaked"
loopback devices.
- Don't setup loopback device if xend is no longer waiting for the
vbd.
- Use the lock file to avoid add/remove races.
Signed-off-by: Gary Grebus <gary.grebus@oracle.com>
Keir Fraser [Tue, 3 Nov 2009 08:37:52 +0000 (08:37 +0000)]
xen-hvmctx: add recently added gtsc_khz field to output
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Mon, 2 Nov 2009 09:38:34 +0000 (09:38 +0000)]
Fixes after addition of dummy_vcpu_info.
- Clean initialisation of new vcpu_info in map_vcpu_info() if the
vcpu was previously using the shared dummy structure.
- Don't allow a vcpu to run with the shared dummy info structure, as
no good can come of it.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 29 Oct 2009 14:48:28 +0000 (14:48 +0000)]
Extend the max vcpu number for HVM guest.
- Originally the max vcpu number for HVM guests was 32; this patch
extends it to 128 on the x86_64 hypervisor. (For the i386 hypervisor,
the max vcpu number is still 32.)
- This patch extends the mp-table size to fit more vcpus.
- The HVM PV driver should call the VCPUOP_register_vcpu_info hypercall
to initialize the vcpu info if the vcpu number is more than 32.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 29 Oct 2009 14:05:46 +0000 (14:05 +0000)]
AMD IOMMU: remove a BUG_ON condition, to allow boot
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Thu, 29 Oct 2009 14:04:45 +0000 (14:04 +0000)]
stubdom: make stubdom-dm exit properly
The built-in bash command wait should be able to take a pid argument
and wait only for the specified process to die, but it currently has a
bug and actually waits for the death of all the children. For this
reason the stubdom-dm script doesn't exit properly after stubdom
destruction. This patch solves the issue by spawning only one child,
removing the sleep subprocess workaround that was used to create a
usable stdin for "xm console" and replacing it with a fifo.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Thu, 29 Oct 2009 14:03:56 +0000 (14:03 +0000)]
Extend max vcpu number for HVM guest
Reduce size of Xen-qemu shared ioreq structure to 32 bytes. This
has two advantages:
1. We can support up to 128 VCPUs with a single shared page
2. If/when we want to go beyond 128 VCPUs, a whole number of ioreq_t
structures will pack into a single shared page, so a multi-page
array will have no ioreq_t straddling a page boundary
Also, while modifying qemu, replace a 32-entry vcpu-indexed array
with a dynamically-allocated array.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
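The arithmetic behind the two advantages listed in this entry, as a C sketch assuming 4 KiB pages: 4096 / 32 = 128 ioreq slots per page, and because 32 divides 4096 exactly, the slot for VCPU v lies wholly within page v*32/4096. The macro and helper names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed x86 page size */
#define IOREQ_SIZE       32u    /* shared ioreq size after this change */

/* One page holds one ioreq per VCPU for 128 VCPUs. */
_Static_assert(SKETCH_PAGE_SIZE / IOREQ_SIZE == 128, "128 ioreqs per page");
/* A whole number of ioreqs per page: none straddles a page boundary. */
_Static_assert(SKETCH_PAGE_SIZE % IOREQ_SIZE == 0, "ioreqs pack exactly");

/* Page index and byte offset of a VCPU's ioreq in a multi-page array. */
static uint32_t ioreq_page(uint32_t vcpu)
{
    return vcpu * IOREQ_SIZE / SKETCH_PAGE_SIZE;
}

static uint32_t ioreq_offset(uint32_t vcpu)
{
    return vcpu * IOREQ_SIZE % SKETCH_PAGE_SIZE;
}
```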
Keir Fraser [Thu, 29 Oct 2009 11:50:09 +0000 (11:50 +0000)]
Update .hgignore list
Keir Fraser [Thu, 29 Oct 2009 11:14:54 +0000 (11:14 +0000)]
Point per-vcpu vcpu_info at a dummy structure by default, avoiding
need for scattered NULL-pointer checks.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
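A minimal C sketch of the "default to a shared dummy" pattern: every vcpu_info pointer starts out aimed at one static placeholder, so readers can dereference it unconditionally. The cut-down structures and helpers are hypothetical, not Xen's definitions; the vcpu_has_private_info() check mirrors the later "Fixes after addition of dummy_vcpu_info" entry in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down per-vcpu info area. */
struct vcpu_info_sketch {
    uint8_t upcall_pending;
    uint8_t upcall_mask;
};

/* One placeholder shared by every vcpu until it gets its own area. */
static struct vcpu_info_sketch dummy_vcpu_info;

struct vcpu_sketch {
    int vcpu_id;
    struct vcpu_info_sketch *vcpu_info; /* never NULL */
};

static void vcpu_init_sketch(struct vcpu_sketch *v, int id)
{
    v->vcpu_id = id;
    v->vcpu_info = &dummy_vcpu_info; /* default instead of NULL */
}

/* No NULL check needed before dereferencing. */
static bool upcall_pending_sketch(const struct vcpu_sketch *v)
{
    return v->vcpu_info->upcall_pending != 0;
}

/* The dummy is shared, so a vcpu still using it should not be run. */
static bool vcpu_has_private_info(const struct vcpu_sketch *v)
{
    return v->vcpu_info != &dummy_vcpu_info;
}
```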
Keir Fraser [Thu, 29 Oct 2009 08:34:51 +0000 (08:34 +0000)]
minios: xmalloc and realloc fixes
- xmalloc currently faults if xmalloc_new_page fails due to OOM
- realloc treats xmalloc_hdr.size as the size of just the data region
rather than the total size of data region + headers + padding.
From: James Pendergrass <James.Pendergrass@jhuapl.edu>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 28 Oct 2009 17:27:47 +0000 (17:27 +0000)]
iommu: Do not initialise global vars explicitly to zero.
Unnecessary, and it prevents them from being allocated in BSS rather than data.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 28 Oct 2009 17:27:09 +0000 (17:27 +0000)]
vtd: Simplify acpi_dmar_init().
No need to check force_iommu, as that is done later in common code.
Also no need to clear iommu_enabled as again this gets checked
later. Furthermore doing it here, from a non-Intel-specific callsite,
breaks other vendors' IOMMU support.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 28 Oct 2009 17:08:26 +0000 (17:08 +0000)]
AMD IOMMU: Use global interrupt remapping table by default
Using a global interrupt remapping table shared by all devices has
better compatibility with certain old BIOSes. Per-device interrupt
remapping table can still be enabled by using a new parameter
"amd-iommu-perdev-intremap".
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Wed, 28 Oct 2009 10:59:55 +0000 (10:59 +0000)]
xend: disallow ! as a sxp separator
Signed-off-by: Jim Fehlig <jfehlig@novell.com>
Keir Fraser [Wed, 28 Oct 2009 10:59:14 +0000 (10:59 +0000)]
x86: vioapic: fix remote irr bit setting for level triggered interrupts
On EOI, clear the remote IRR bit of every RTE whose vector field
matches the EOI message's vector.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
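A minimal C sketch of the EOI handling described above, assuming a cut-down redirection table: every entry whose vector matches the EOI'd vector has its remote IRR cleared. The struct, NR_RTES and vioapic_eoi_sketch() are hypothetical, not the actual vioapic code.

```c
#include <stdint.h>

#define NR_RTES 24   /* assumed number of redirection table entries */

/* Hypothetical, cut-down redirection table entry. */
struct rte_sketch {
    uint8_t vector;
    uint8_t remote_irr;  /* set while a level-triggered interrupt awaits EOI */
};

/* Clear remote IRR in every entry whose vector matches the EOI'd vector. */
static void vioapic_eoi_sketch(struct rte_sketch rtes[], int nr, uint8_t vector)
{
    for (int i = 0; i < nr; i++)
        if (rtes[i].vector == vector)
            rtes[i].remote_irr = 0;
}
```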
Keir Fraser [Wed, 28 Oct 2009 10:56:39 +0000 (10:56 +0000)]
scheduler: small csched_cpu_pick() adjustments
When csched_cpu_pick() decides to move a vCPU to a different pCPU, so
far in the vast majority of cases it selected the first core/thread of
the most idle socket/core. When there are many short executing
entities, this will generally lead to them not getting evenly
distributed (since primary cores/threads will be preferred), making
the need for subsequent migration more likely. Instead, candidate
cores/threads should get treated as symmetrically as possible, and
hence this changes the selection logic to cycle through all
candidates.
Further, since csched_cpu_pick() will never move a vCPU between
threads of the same core (and since the weights calculated for
individual threads of the same core are always identical), rather than
removing just the selected pCPU from the mask that still needs looking
at, all siblings of the chosen pCPU can be removed at once without
affecting the outcome.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 28 Oct 2009 10:55:53 +0000 (10:55 +0000)]
x86: deny access to the ACPI PM timer I/O port range for Dom0
Also move the declaration of pmtmr_ioport to a suitable header file.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 28 Oct 2009 10:55:17 +0000 (10:55 +0000)]
Boot parameter definition adjustments
Consolidate the various attributes into macros, and tell the compiler
not to needlessly waste space aligning strings used at most once.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 28 Oct 2009 10:54:50 +0000 (10:54 +0000)]
Miscellaneous data placement adjustments
Make various data items const or __read_mostly where
possible/reasonable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 28 Oct 2009 10:54:20 +0000 (10:54 +0000)]
irq cleanup
Make IRQ related data const or __read_mostly where possible/reasonable,
use platform_legacy_irq() where feasible, and remove the now unused
definition of vector_to_irq().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 27 Oct 2009 12:52:57 +0000 (12:52 +0000)]
xsm: Add support for Xen device policies
Add support for Xen ocontext records to enable device policies. The
default policy is not changed, and instructions have been added to
enable the new functionality. Examples of how to use the new policy
language have been added but commented out. The newest versions of
checkpolicy (>= 2.0.20) and libsepol (>= 2.0.39) are needed in order to
compile it. Devices can be labeled and enforced using the following
new commands: pirqcon, iomemcon, ioportcon and pcidevicecon.
Signed-off-by : George Coker <gscoker@alpha.ncsc.mil>
Signed-off-by : Paul Nuzzi <pjnuzzi@tycho.ncsc.mil>
Keir Fraser [Tue, 27 Oct 2009 12:52:14 +0000 (12:52 +0000)]
xend: Add keymap to vfb config for hvm guests
From: Jim Fehlig <jfehlig@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 26 Oct 2009 13:33:38 +0000 (13:33 +0000)]
x86: IRQ Migration logic enhancement.
To program an MSI's address/vector safely, defer the IRQ migration
operation until just before the next interrupt is acked. This should
avoid generating inconsistent interrupts due to non-atomic writes to
the MSI address and data registers.
Port the logic from Linux and tailor it for Xen.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Mon, 26 Oct 2009 13:26:43 +0000 (13:26 +0000)]
x86: Small simplification to get_page_from_l1e().
No need for separate top-level check for page owner being NULL: this
can be folded into the case that page owner is not who the caller
expected (caller will never expect NULL owner).
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 26 Oct 2009 13:19:33 +0000 (13:19 +0000)]
hvm: Clean up EPT/NPT 'nested page fault' handling.
Share most of the code.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 26 Oct 2009 12:20:07 +0000 (12:20 +0000)]
xend, passthrough: Small fix to find_all_the_multi_functions()
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 26 Oct 2009 12:18:50 +0000 (12:18 +0000)]
shadow dirty-VRAM: avoid multiple remove_all_mappings calls.
sh_remove_all_mappings() will walk roughly half of the shadow L1
tables for each MFN it's called with; calling it for every MFN in a
guest's framebuffer can be _very_ expensive, especially with the
shadow lock held across the whole operation. Avoid that by just
blowing away all the shadows.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Fri, 23 Oct 2009 09:15:17 +0000 (10:15 +0100)]
x86: Enable TSC_RELIABLE for AMD servers
Except for a published BIOS erratum on family 11h processors,
all AMD servers that have the Invariant TSC bit set have
a reliable TSC so Xen should not write to the TSC.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Keir Fraser [Fri, 23 Oct 2009 09:13:52 +0000 (10:13 +0100)]
x86 ept: ignore guest writes to read only memory regions or memory
holes in EPT.
This patch prevents domain crash when running memtest86 with EPT.
Signed-off-by: Xin Li <xin.li@intel.com>
Keir Fraser [Fri, 23 Oct 2009 09:13:22 +0000 (10:13 +0100)]
vtd: interrupt remapping fix
Fix an error in the translation from an interrupt remapping table
entry (IRTE) to an MSI message. This error may write a wrong IRTE back
to the VT-d hardware and block physical interrupts.
Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
Keir Fraser [Fri, 23 Oct 2009 09:12:52 +0000 (10:12 +0100)]
xsm: Corrected check in io_has_perm()
Fix the check in io_has_perm() to correctly check the start and end
of I/O Memory.
Signed-off-by : George Coker <gscoker@alpha.ncsc.mil>
Signed-off-by : Paul Nuzzi <pjnuzzi@tycho.ncsc.mil>
Keir Fraser [Fri, 23 Oct 2009 09:11:52 +0000 (10:11 +0100)]
x86: Fix RevF detection in powernow.c
The PowerNow! driver does not support RevF and earlier parts.
The current code checks for RevF processors in a function that
is not called. Change the code path so that RevF processors
are detected and the driver fails registration.
Also fix cpufreq_add_cpu() to handle unsuccessful registration.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Keir Fraser [Fri, 23 Oct 2009 09:09:37 +0000 (10:09 +0100)]
blktap2: Fix sysfs handling of blktap2
The pause and unpause paths are currently broken due to a missing
slash. I took advantage of the opportunity to remove code repetition,
replace repeated strings with the proper constants, etc.
From: Andres Lagar Cavilla <andreslc@cs.toronto.edu>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 23 Oct 2009 09:05:15 +0000 (10:05 +0100)]
xsm: Add getenforce and setenforce functionality to tools
This patch exposes the getenforce and setenforce functionality for the
Flask XSM module.
Signed-off-by : Machon Gregory <mbgrego@tycho.ncsc.mil>
Signed-off-by : George S. Coker, II <gscoker@alpha.ncsc.mil>
Keir Fraser [Fri, 23 Oct 2009 09:04:03 +0000 (10:04 +0100)]
passthrough/stubdom: clean up hypercall privilege checking
This patch adds security checks for pci passthrough related hypercalls
to enforce that the current domain owns the resources that it is about
to remap. It also adds a call to xc_assign_device to xend and removes
the PRIVILEGED_STUBDOMS flags.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 23 Oct 2009 09:02:09 +0000 (10:02 +0100)]
blktap: Fix check_sharing() in blktapctrl
check_sharing() in blktapctrl does not work.
- It accesses xenstore using wrong paths.
- It compares image paths including image types.
- It misinterprets the return value of strcmp().
This patch fixes those mistakes.
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Fri, 23 Oct 2009 09:00:22 +0000 (10:00 +0100)]
libxc: fix a few memory leaks
Running qemu under valgrind, I found a couple of small memory leaks in
libxc; this patch fixes them.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 23 Oct 2009 08:59:45 +0000 (09:59 +0100)]
minios: Optimize mmap(open("/dev/mem"))
Set map_frames_ex's stride parameter to 0 and increment to 1 to avoid
building an explicit list of mfns.
Signed-Off-By: Samuel Thibault <samuel.thibault@ens-lyon.org>
Keir Fraser [Wed, 21 Oct 2009 15:08:28 +0000 (16:08 +0100)]
stubdom: mmap on /dev/mem support
This patch adds support for mmap on /dev/mem in a stubdom; it is
secure because it only works for memory areas that have been
explicitly allowed by the toolstack (xc_domain_iomem_permission).
Incidentally this is all that is needed to make MSI-X passthrough work
with stubdoms.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 21 Oct 2009 15:07:37 +0000 (16:07 +0100)]
x86: Initialize the affinity field after assigning the vector.
To avoid strange output from debug-key "i", desc->affinity should
basically be a subset of cfg->domain, so copy cfg->domain to
desc->affinity after assigning the vector for the irq.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Wed, 21 Oct 2009 15:06:30 +0000 (16:06 +0100)]
Keir Fraser [Wed, 21 Oct 2009 15:05:05 +0000 (16:05 +0100)]
Remove unused XEN_DOMINF_cpu{mask,shift} definitions.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 21 Oct 2009 08:23:10 +0000 (09:23 +0100)]
xend: bootable flag of VBD not always of type int
1. Calling VBD.set_bootable(True) results in the string 'True' in the
managed config file. After a xend restart, the conversion int(bootable)
in server/blkif.py fails.
2. Selection of bootable disks in XendDomainInfo.py requires
type(bootable) == int, not str; otherwise all disks are taken as
bootable.
This patch always converts the bootable flag to int.
Signed-off-by: Lutz Dube <Lutz.Dube@ts.fujitsu.com>
Keir Fraser [Wed, 21 Oct 2009 08:21:01 +0000 (09:21 +0100)]
xmalloc_tlsf: Fall back to xmalloc_whole_pages() if xmem_pool_alloc() fails.
This was happening for xmalloc request sizes between 3921 and 3951
bytes. This is because xmem_pool_alloc() may add extra padding
to the requested size, making the total block size greater than a
page.
Rather than add yet more smarts about TLSF to _xmalloc(), we just
dumbly attempt any request smaller than a page via xmem_pool_alloc()
first, then fall back on xmalloc_whole_pages() if this fails.
Based on bug diagnosis and initial patch by John Byrne <john.l.byrne@hp.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
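A minimal C sketch of the "try the pool, then fall back" strategy, using malloc() as a stand-in allocator. POOL_OVERHEAD and all *_sketch functions are hypothetical; the real xmem_pool_alloc() padding rules are more involved, and this only models a pool that fails when request plus overhead exceeds a page.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u
#define POOL_OVERHEAD    160u  /* hypothetical header + padding per block */

static int whole_page_allocs;  /* counts fallback allocations, for illustration */

/* Stand-in for xmem_pool_alloc(): request plus overhead must fit in a page. */
static void *pool_alloc_sketch(size_t size)
{
    if (size + POOL_OVERHEAD > SKETCH_PAGE_SIZE)
        return NULL;
    return malloc(size);
}

/* Stand-in for xmalloc_whole_pages(): round the request up to whole pages. */
static void *whole_pages_alloc_sketch(size_t size)
{
    size_t pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

    whole_page_allocs++;
    return malloc(pages * SKETCH_PAGE_SIZE);
}

/* Try the pool for any sub-page request without predicting its padding. */
static void *xmalloc_sketch(size_t size)
{
    void *p = NULL;

    if (size < SKETCH_PAGE_SIZE)
        p = pool_alloc_sketch(size);
    if (p == NULL)
        p = whole_pages_alloc_sketch(size);
    return p;
}
```

The design choice matches the commit text: _xmalloc() does not need to know how much padding the pool adds, because a failed pool attempt simply falls through to the page allocator.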
Keir Fraser [Wed, 21 Oct 2009 07:51:10 +0000 (08:51 +0100)]
stubdom: implement pci coldplug
This patch fixes the circular dependency problem in the toolstack that
prevented pci coldplug from working with stubdoms: after creating the
stubdom we wait for it to be properly initialized before going
further. We release the domain lock while we wait.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 21 Oct 2009 07:50:23 +0000 (08:50 +0100)]
x86: MSI: Mask/unmask msi irq during the window which programs msi.
When programming an MSI, it has to be masked first; otherwise it
may generate inconsistent interrupts. According to the spec, if it is
not masked, the interrupt generation behaviour is undefined.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Tue, 20 Oct 2009 13:36:01 +0000 (14:36 +0100)]
Obtain Linux kernel via git protocol by default (GIT_HTTP=y overrides)
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>